Variational Bayes for Merging Noisy Databases

Authors

  • Tamara Broderick
  • Rebecca C. Steorts

Abstract

Bayesian entity resolution merges multiple noisy databases and returns the minimal collection of unique individuals they represent, together with their true, latent record values. Bayesian methods allow flexible generative models that share power across databases, as well as principled quantification of uncertainty for queries of the final, resolved database. However, existing Bayesian methods for entity resolution use Markov chain Monte Carlo (MCMC) approximations and are too slow to run on modern databases containing millions or billions of records. Instead, we propose applying variational approximations to allow scalable Bayesian inference in these models. We derive a coordinate-ascent approximation for mean-field variational Bayes, qualitatively compare our algorithm to existing methods, note unique challenges for inference that arise from the expected distribution of cluster sizes in entity resolution, and discuss directions for future work in this domain.
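To make the coordinate-ascent idea concrete, here is a minimal Python sketch of mean-field coordinate ascent on a deliberately simplified, hypothetical entity-resolution model. It is not the model or the algorithm derived in the paper. Assumptions: every field is categorical, there is a fixed number of latent entity slots (`n_entities`), each record links to a slot under a uniform prior, and each observed field is either copied exactly from its entity or, with a known distortion probability `beta`, redrawn from a known value distribution `theta`. The function name `cavi_entity_resolution` and all of its parameters are illustrative. The variational family factorizes into q(link of each record) times q(true value of each entity field), and each sweep updates one block while holding the other fixed.

```python
import math
import random


def normalize_log(logs):
    """Convert unnormalized log-probabilities to probabilities (log-sum-exp)."""
    m = max(logs)
    w = [math.exp(a - m) for a in logs]
    s = sum(w)
    return [a / s for a in w]


def cavi_entity_resolution(records, n_entities, theta, beta=0.1, iters=50, seed=0):
    """Mean-field coordinate ascent for a toy (hypothetical) entity-resolution model.

    Model assumed here, not taken from the paper: entity j has true values
    y[j][l] ~ Categorical(theta[l]); record i links to lam[i] ~ Uniform(n_entities);
    field l of record i equals y[lam[i]][l] with prob. 1 - beta, otherwise it is
    redrawn from theta[l]. Requires iters >= 1 and all theta entries > 0.

    Returns phi[i][j] = q(lam[i] = j) and psi[j][l][v] = q(y[j][l] = v).
    """
    rng = random.Random(seed)
    R, L = len(records), len(records[0])

    # log p(x_il | y = v) for every record i, field l, and candidate true value v.
    loglik = [[[math.log((1 - beta) * (x == v) + beta * theta[l][x])
                for v in range(len(theta[l]))]
               for l, x in enumerate(rec)]
              for rec in records]

    # Random soft links break the symmetry between interchangeable entity slots.
    phi = []
    for _ in range(R):
        w = [rng.random() for _ in range(n_entities)]
        s = sum(w)
        phi.append([a / s for a in w])

    for _ in range(iters):
        # Update q(y[j][l]): log prior plus phi-weighted log-likelihood of all records.
        psi = [[normalize_log([math.log(theta[l][v])
                               + sum(phi[i][j] * loglik[i][l][v] for i in range(R))
                               for v in range(len(theta[l]))])
                for l in range(L)]
               for j in range(n_entities)]
        # Update q(lam[i]): expected log-likelihood of record i under each entity.
        # The uniform link prior is constant in j and cancels when normalizing.
        phi = [normalize_log([sum(psi[j][l][v] * loglik[i][l][v]
                                  for l in range(L)
                                  for v in range(len(theta[l])))
                              for j in range(n_entities)])
               for i in range(R)]
    return phi, psi


if __name__ == "__main__":
    # Toy data: records 0/1 and records 2/3 are exact duplicates.
    records = [[0, 0], [0, 0], [1, 1], [1, 1]]
    theta = [[0.25] * 4, [0.25] * 4]  # uniform over 4 values per field
    phi, psi = cavi_entity_resolution(records, n_entities=2, theta=theta)
    links = [max(range(2), key=lambda j: p[j]) for p in phi]
    print("most probable entity per record:", links)
```

Coordinate ascent only reaches a local optimum, so the result can depend on the random initialization (`seed`). In this toy run, records 0 and 1 should end up linked to one entity slot and records 2 and 3 to the other. Each full sweep in this naive form costs on the order of R · N · L · V operations (records, entity slots, fields, values per field).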

Similar resources

The Variational EM Algorithm for On-line Identification of Extended AR Models

The AutoRegressive (AR) model is extended to cope with a wide class of possible transformations and degradations. The Variational Bayes (VB) procedure is used to restore conjugacy. The resulting Bayesian recursive identification procedure has many of the desirable computational properties of the classical RLS procedure. During each time-step, an iterative Variational EM (VEM) procedure is requi...

A Filtering Approach to Stochastic Variational Inference

Stochastic variational inference (SVI) uses stochastic optimization to scale up Bayesian computation to massive data. We present an alternative perspective on SVI as approximate parallel coordinate ascent. SVI trades off bias and variance to step close to the unknown true coordinate optimum given by batch variational Bayes (VB). We define a model to automate this process. The model infers the l...

A low-cost variational-Bayes technique for merging mixtures of probabilistic principal component analyzers

Mixtures of probabilistic principal component analyzers (MPPCA) have proven effective for modeling high-dimensional data sets living on nonlinear manifolds. Briefly stated, they conduct mixture model estimation and dimensionality reduction through a single process. This paper makes two contributions: first, we present a Bayesian technique for estimating such mixture models. Then, assuming sever...

An Alternative View of Variational Bayes and Minimum Variational Stochastic Complexity

Bayesian learning is widely used in many applied data-modelling problems and is often accompanied by approximation schemes, since it requires intractable computation of the posterior distributions. In this study, we focus on two approximation methods: variational Bayes and the local variational approximation. We show that the variational Bayes approach for statistical models with latent...

Variational Bayes phase tracking for correlated dual-frequency measurements with slow dynamics

We consider the problem of estimating the absolute phase of a noisy signal when the signal consists of correlated dual-frequency measurements. This scenario may arise in many application areas, such as global navigation satellite systems (GNSS). In this paper, we assume a slowly varying phase and accordingly propose a Bayesian filtering technique that makes use of the frequency diversity. More spe...

Publication date: 2014